本文介绍了Cool-MC,这是一种集成了最先进的加固学习(RL)和模型检查的工具。具体而言,该工具建立在OpenAI健身房和概率模型检查器风暴上。COOL-MC提供以下功能:(1)模拟器在OpenAI体育馆训练RL政策,用于Markov决策过程(MDPS),这些模拟器定义为暴风雨的输入,(2)使用“ SORM”的新型号构建器,用于使用回调功能要验证(神经网络)RL策略,(3)与OpenAI Gym或Storm中指定的模型和政策相关的正式抽象,以及(4)算法以获得有关所谓允许政策的性能的界限。我们描述了Cool-MC的组件和体系结构,并在多个基准环境中演示了其功能。
translated by 谷歌翻译
安全探索是强化学习(RL)的常见问题,旨在防止代理在探索环境时做出灾难性的决定。一个解决这个问题的方法家庭以这种环境的(部分)模型的形式假设域知识,以决定动作的安全性。所谓的盾牌迫使RL代理只选择安全的动作。但是,要在各种应用中采用,必须超越执行安全性,还必须确保RL的适用性良好。我们通过与最先进的深度RL的紧密整合扩展了盾牌的适用性,并在部分可观察性下提供了充满挑战的,稀疏的奖励环境中的广泛实证研究。我们表明,经过精心整合的盾牌可确保安全性,并可以提高RL代理的收敛速度和最终性能。我们此外表明,可以使用盾牌来引导最先进的RL代理:它们在屏蔽环境中初步学习后保持安全,从而使我们最终可以禁用潜在的过于保守的盾牌。
translated by 谷歌翻译
基于仿真的自主车辆(AVS)测试已成为道路测试的必要补充,以确保安全。因此,实质性研究专注于寻找模拟中的失败情景。但是,仍然存在一个基本问题:是在实际情况下模拟中识别的AV失败情景,即它们在真实系统上可重复?由于模拟和实际传感器数据之间的差异引起的SIM-to-实际间隙,模拟中识别的故障场景可以是合成传感器数据的虚假工件或持续存在具有实际传感器数据的实际故障。验证模拟故障方案的方法是在真实数据的语料库中识别场景的情况,并检查故障是否持续存在于实际数据上。为此,我们提出了一个正式的定义,它对标记的数据项匹配抽象场景的方法,以使用风景概率编程语言编码为场景程序。使用此定义,我们开发了一个查询算法,给定场景程序和标记的数据集,找到符合场景的数据子集。实验表明,我们的算法在各种现实的交通方案上是准确和高效的,并缩放到合理数量的代理商。
translated by 谷歌翻译
Logic Mill is a scalable and openly accessible software system that identifies semantically similar documents within either one domain-specific corpus or multi-domain corpora. It uses advanced Natural Language Processing (NLP) techniques to generate numerical representations of documents. Currently it leverages a large pre-trained language model to generate these document representations. The system focuses on scientific publications and patent documents and contains more than 200 million documents. It is easily accessible via a simple Application Programming Interface (API) or via a web interface. Moreover, it is continuously being updated and can be extended to text corpora from other domains. We see this system as a general-purpose tool for future research applications in the social sciences and other domains.
translated by 谷歌翻译
The analysis of network structure is essential to many scientific areas, ranging from biology to sociology. As the computational task of clustering these networks into partitions, i.e., solving the community detection problem, is generally NP-hard, heuristic solutions are indispensable. The exploration of expedient heuristics has led to the development of particularly promising approaches in the emerging technology of quantum computing. Motivated by the substantial hardware demands for all established quantum community detection approaches, we introduce a novel QUBO based approach that only needs number-of-nodes many qubits and is represented by a QUBO-matrix as sparse as the input graph's adjacency matrix. The substantial improvement on the sparsity of the QUBO-matrix, which is typically very dense in related work, is achieved through the novel concept of separation-nodes. Instead of assigning every node to a community directly, this approach relies on the identification of a separation-node set, which -- upon its removal from the graph -- yields a set of connected components, representing the core components of the communities. Employing a greedy heuristic to assign the nodes from the separation-node sets to the identified community cores, subsequent experimental results yield a proof of concept. This work hence displays a promising approach to NISQ ready quantum community detection, catalyzing the application of quantum computers for the network structure analysis of large scale, real world problem instances.
translated by 谷歌翻译
The following article presents a memetic algorithm with applying deep reinforcement learning (DRL) for solving practically oriented dual resource constrained flexible job shop scheduling problems (DRC-FJSSP). In recent years, there has been extensive research on DRL techniques, but without considering realistic, flexible and human-centered shopfloors. A research gap can be identified in the context of make-to-order oriented discontinuous manufacturing as it is often represented in medium-size companies with high service levels. From practical industry projects in this domain, we recognize requirements to depict flexible machines, human workers and capabilities, setup and processing operations, material arrival times, complex job paths with parallel tasks for bill of material (BOM) manufacturing, sequence-depended setup times and (partially) automated tasks. On the other hand, intensive research has been done on metaheuristics in the context of DRC-FJSSP. However, there is a lack of suitable and generic scheduling methods that can be holistically applied in sociotechnical production and assembly processes. In this paper, we first formulate an extended DRC-FJSSP induced by the practical requirements mentioned. Then we present our proposed hybrid framework with parallel computing for multicriteria optimization. Through numerical experiments with real-world data, we confirm that the framework generates feasible schedules efficiently and reliably. Utilizing DRL instead of random operations leads to better results and outperforms traditional approaches.
translated by 谷歌翻译
The acquisition of high-quality human annotations through crowdsourcing platforms like Amazon Mechanical Turk (MTurk) is more challenging than expected. The annotation quality might be affected by various aspects like annotation instructions, Human Intelligence Task (HIT) design, and wages paid to annotators, etc. To avoid potentially low-quality annotations which could mislead the evaluation of automatic summarization system outputs, we investigate the recruitment of high-quality MTurk workers via a three-step qualification pipeline. We show that we can successfully filter out bad workers before they carry out the evaluations and obtain high-quality annotations while optimizing the use of resources. This paper can serve as basis for the recruitment of qualified annotators in other challenging annotation tasks.
translated by 谷歌翻译
We present NusaCrowd, a collaborative initiative to collect and unite existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, we have has brought together 137 datasets and 117 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their effectiveness has been demonstrated in multiple experiments. NusaCrowd's data collection enables the creation of the first zero-shot benchmarks for natural language understanding and generation in Indonesian and its local languages. Furthermore, NusaCrowd brings the creation of the first multilingual automatic speech recognition benchmark in Indonesian and its local languages. Our work is intended to help advance natural language processing research in under-represented languages.
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
State-of-the-art language models are often accurate on many question-answering benchmarks with well-defined questions. Yet, in real settings questions are often unanswerable without asking the user for clarifying information. We show that current SotA models often do not ask the user for clarification when presented with imprecise questions and instead provide incorrect answers or "hallucinate". To address this, we introduce CLAM, a framework that first uses the model to detect ambiguous questions, and if an ambiguous question is detected, prompts the model to ask the user for clarification. Furthermore, we show how to construct a scalable and cost-effective automatic evaluation protocol using an oracle language model with privileged information to provide clarifying information. We show that our method achieves a 20.15 percentage point accuracy improvement over SotA on a novel ambiguous question-answering answering data set derived from TriviaQA.
translated by 谷歌翻译